RNA-Seq Data Analysis    ◾    205

# Removing rows (genes) without an Entrez ID
i <- is.na(y$genes$ENTREZID)
y <- y[!i, ]

# Creating the design matrix (no intercept: one column per condition)
condition <- factor(sampleinfo$condition)
design <- model.matrix(~ 0 + condition)
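To see what a no-intercept design looks like, here is a minimal sketch with a hypothetical four-sample layout (two normal, two tumor samples); the labels are made up for illustration, and the toy_ prefix keeps the objects from overwriting `condition` and `design` in the main script. Dropping the intercept with `0 +` gives one 0/1 indicator column per factor level, so each coefficient estimated later corresponds to one group.

```r
# Hypothetical sample conditions for illustration only; the toy_ prefix keeps
# these objects from overwriting `condition` and `design` in the main script
toy_condition <- factor(c("normal", "normal", "tumor", "tumor"))

# No-intercept design: one 0/1 indicator column per condition level
toy_design <- model.matrix(~ 0 + toy_condition)
print(toy_design)

# Column names are the variable name pasted to each level (levels sorted alphabetically)
colnames(toy_design)
```

Because every sample belongs to exactly one group, each row contains a single 1. A tumor-versus-normal comparison is then expressed as a contrast between the two columns, for example with limma's `makeContrasts()`.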

# Filtering genes with low abundance
keep <- filterByExpr(y, design)
# keep.lib.sizes = FALSE recomputes library sizes from the retained genes
y <- y[keep, , keep.lib.sizes = FALSE]
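The idea behind abundance filtering can be sketched in base R. This is a simplified illustration, not edgeR's exact `filterByExpr` rule: it keeps a gene if its counts per million (CPM) reach a cutoff equivalent to about 10 reads at the median library size (an assumed `min.count` of 10) in at least as many samples as the smallest group. The count matrix is hypothetical.

```r
# Simplified sketch of abundance filtering (NOT edgeR's exact filterByExpr rule)
# Hypothetical toy counts: 4 genes x 4 samples
toy_counts <- matrix(c(100, 120,  90, 110,   # well expressed everywhere
                         0,   1,   0,   2,   # essentially absent
                        30,  25,   0,   1,   # expressed in normal samples only
                         5,   3,   4,   6),  # low everywhere
                     nrow = 4, byrow = TRUE,
                     dimnames = list(paste0("gene", 1:4), paste0("s", 1:4)))
toy_group <- factor(c("normal", "normal", "tumor", "tumor"))

# Counts per million: divide each sample (column) by its library size
toy_lib <- colSums(toy_counts)
toy_cpm <- t(t(toy_counts) / toy_lib) * 1e6

# CPM cutoff equivalent to ~10 reads at the median library size (assumed min.count = 10)
toy_cutoff <- 10 / median(toy_lib) * 1e6

# The cutoff must be met in at least as many samples as the smallest group
toy_min_samples <- min(table(toy_group))

toy_keep <- rowSums(toy_cpm >= toy_cutoff) >= toy_min_samples
toy_keep
```

Requiring only as many samples as the smallest group lets a gene expressed in one condition alone (gene3 here) survive. In the real pipeline, `filterByExpr` also checks a minimum total count across samples, so its results can differ from this sketch.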

# Normalizing count data (TMM normalization factors by default)
yNorm <- calcNormFactors(y)
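`calcNormFactors` does not change the counts; it stores one scaling factor per sample in `yNorm$samples$norm.factors`, rescaled so that the factors multiply to one. Downstream functions, including edgeR's `cpm()` by default, divide by the effective library size, that is, the raw library size times the normalization factor. The sketch below reproduces that arithmetic in base R with hypothetical counts and hypothetical normalization factors chosen for illustration, not values computed by TMM.

```r
# Hypothetical toy counts: 3 genes x 2 samples
toy_counts <- matrix(c(50, 200,
                       30,  60,
                       20,  40), nrow = 3, byrow = TRUE)
toy_lib <- colSums(toy_counts)        # raw library sizes: 100 and 300

# Hypothetical normalization factors (product = 1, as edgeR rescales them)
toy_nf <- c(1.25, 0.8)

# Effective library size = raw library size x normalization factor
toy_eff_lib <- toy_lib * toy_nf       # 125 and 240

# Counts per million using the effective library sizes (divide each column)
toy_cpm <- sweep(toy_counts, 2, toy_eff_lib, "/") * 1e6
round(toy_cpm)
```

These normalized CPM values are the kind of quantity summarized in the box plots of Figure 5.30.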

# Estimating dispersions (common, trended, and tagwise)
yNorm <- estimateDisp(yNorm, design)
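The dispersion estimated by `estimateDisp` is the parameter of the negative binomial model that edgeR uses for counts: it links each gene's variance to its mean. For gene g in sample i with expected count μ_gi, the model and a worked example with illustrative numbers (μ = 100, φ = 0.04) are:

```latex
\operatorname{Var}(y_{gi}) = \mu_{gi} + \phi_g\,\mu_{gi}^{2},
\qquad \mathrm{BCV}_g = \sqrt{\phi_g}

\text{Example: } \mu = 100,\ \phi = 0.04 \;\Rightarrow\;
\operatorname{Var}(y) = 100 + 0.04 \times 100^{2} = 500,
\qquad \mathrm{BCV} = 0.2
```

With φ = 0 the model reduces to Poisson variation; the square root of the dispersion is the biological coefficient of variation (BCV), which `plotBCV(yNorm)` displays for the common, trended, and tagwise estimates.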

Once the script above runs without errors, you can use the “vidger” functions to create plots as follows.

FIGURE 5.30  Box plots showing the distribution of normal and tumor counts in CPM.